# 11 February 2014 -- Computer Architectures -- part 2/2

Name, Student ID .....

### **Question 1**

Consider a superscalar MIPS64 processor implemented with multiple issue and speculation, with the following characteristics:

- issues 2 instructions per clock cycle
- jump instructions require 1 issue
- commits 2 instructions per clock cycle
- timing of the following separate functional units:
  - i. 1 memory address unit: 1 clock cycle
  - ii. 1 integer ALU: 1 clock cycle
  - iii. 1 jump unit: 1 clock cycle
  - iv. 1 FP multiplier unit, pipelined: 10 stages
  - v. 1 FP divider unit, not pipelined: 8 clock cycles
  - vi. 1 FP arithmetic unit, pipelined: 2 stages
- Branch prediction is always correct
- There are no cache misses
- There are 2 CDBs (Common Data Buses)

Complete the table below, showing the processor behavior for the first 2 iterations of the loop listed in the table.

| Iteration   | Instruction    | Issue | EXE | MEM | CDB (x2) | COMMIT (x2) |
|-------------|----------------|-------|-----|-----|--------|-----------|
| 1           | l.d f1,v1(r1)  |       |     |     |        |           |
| 1           | l.d f2,v2(r1)  |       |     |     |        |           |
| 1           | l.d f3,v3(r1)  |       |     |     |        |           |
| 1           | div.d f4,f1,f2 |       |     |     |        |           |
| 1           | s.d f4,v4(r1)  |       |     |     |        |           |
| 1           | mul.d f5,f1,f2 |       |     |     |        |           |
| 1           | div.d f2,f1,f3 |       |     |     |        |           |
| 1           | add.d f1,f5,f2 |       |     |     |        |           |
| 1           | s.d f1,v5(r1)  |       |     |     |        |           |
| 1           | daddui r1,r1,8 |       |     |     |        |           |
| 1           | daddi r2,r2,-1 |       |     |     |        |           |
| 1           | bnez r2,loop   |       |     |     |        |           |
| 2           | l.d f1,v1(r1)  |       |     |     |        |           |
| 2           | l.d f2,v2(r1)  |       |     |     |        |           |
| 2           | l.d f3,v3(r1)  |       |     |     |        |           |
| 2           | div.d f4,f1,f2 |       |     |     |        |           |
| 2           | s.d f4,v4(r1)  |       |     |     |        |           |
| 2           | mul.d f5,f1,f2 |       |     |     |        |           |
| 2           | div.d f2,f1,f3 |       |     |     |        |           |
| 2           | add.d f1,f5,f2 |       |     |     |        |           |
| 2           | s.d f1,v5(r1)  |       |     |     |        |           |
| 2           | daddui r1,r1,8 |       |     |     |        |           |
| 2           | daddi r2,r2,-1 |       |     |     |        |           |
| 2           | bnez r2,loop   |       |     |     |        |           |
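
As a starting point for filling in the table, the read-after-write (RAW) register dependencies inside the loop can be listed mechanically. The Python sketch below is only an aid and not part of the exam text: the `LOOP` encoding and the helper `raw_dependencies` are hypothetical names introduced here. It tracks register dependencies only and ignores memory dependencies, the issue and commit width, the 2 CDBs, and the non-pipelined divider.

```python
# One loop iteration: (instruction text, destination register or None, source registers).
# The encoding is an illustrative assumption; the instructions come from the table above.
LOOP = [
    ("l.d f1,v1(r1)", "f1", ["r1"]),
    ("l.d f2,v2(r1)", "f2", ["r1"]),
    ("l.d f3,v3(r1)", "f3", ["r1"]),
    ("div.d f4,f1,f2", "f4", ["f1", "f2"]),
    ("s.d f4,v4(r1)", None, ["f4", "r1"]),
    ("mul.d f5,f1,f2", "f5", ["f1", "f2"]),
    ("div.d f2,f1,f3", "f2", ["f1", "f3"]),
    ("add.d f1,f5,f2", "f1", ["f5", "f2"]),
    ("s.d f1,v5(r1)", None, ["f1", "r1"]),
    ("daddui r1,r1,8", "r1", ["r1"]),
    ("daddi r2,r2,-1", "r2", ["r2"]),
    ("bnez r2,loop", None, ["r2"]),
]


def raw_dependencies(program):
    """Map each instruction index to the (producer index, register) pairs it waits for."""
    last_writer = {}  # register name -> index of the most recent instruction writing it
    deps = {}
    for i, (_, dest, sources) in enumerate(program):
        # Sources are read before the destination is recorded, so an instruction
        # such as daddui r1,r1,8 never depends on itself.
        deps[i] = [(last_writer[r], r) for r in sources if r in last_writer]
        if dest is not None:
            last_writer[dest] = i
    return deps
```

Calling `raw_dependencies(LOOP * 2)` extends the analysis to the 2 iterations in the table; the loads of the second iteration then depend on `daddui r1,r1,8` from the first iteration.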

### **Question 2**

Consider a BHT (branch history table) of 1K entries, each a 2-bit saturating counter. Assuming that the processor executes the following code fragment, determine the final state of the BPU (branch prediction unit) and calculate the resulting misprediction ratio. The initial BPU state is given in the table.

#### General assumptions:

- R10 is the main loop control register; it is initialized to 100, so the program iterates 100 times.
- R3 is the reference value, set to 1.
- R2 is the input register.
  - The input values for R2 are the sequence of integers from 0 (in the first iteration) to 99 (in the last iteration), i.e., [0, 1, 2, 3, 4, 5, ..., 99].
- The grayed instructions in the program do not contain any branch or jump instruction.

| Address | Label | Instruction                  | BHT (2-bit) | Prediction | Misprediction counter |
|---------|-------|------------------------------|-------------|------------|-----------------------|
| 0x0000  | L0:   |                              | 3           | T          |                       |
|         |       | ; reading input values in R2 | 3           | T          |                       |
| 0x0020  |       | AND R1, R2, R3               | 3           | T          |                       |
| 0x0024  |       | BEQZ R1, L1                  | 3           | T          |                       |
| 0x0028  |       |                              | 3           | T          |                       |
| 0x002C  | L1:   | XOR R4, R1, R3               | 3           | T          |                       |
| 0x0030  |       | BEQZ R4, L2                  | 3           | T          |                       |
| 0x0034  |       |                              | 3           | T          |                       |
| 0x0038  | L2:   | AND R5, R1, R3               | 3           | T          |                       |
| 0x003C  |       | BEQZ R5, L3                  | 3           | T          |                       |
| 0x0040  | L3:   |                              | 3           | T          |                       |
| 0x0050  |       | DADDI R10, R10, #-1          | 3           | T          |                       |
| 0x0054  |       | BNEZ R10, L0                 | 3           | T          |                       |
|         |       |                              | 3           | T          |                       |
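
The behavior of the 2-bit saturating counters can be checked with a short simulation. The Python sketch below is not part of the exam text and rests on several assumptions: instructions are 4 bytes wide and the BHT is indexed by the word address modulo 1024 (so the four branches use distinct entries), a counter value of 2 or 3 predicts taken, and the grayed instructions do not modify R1, R3, or R10. The names `bht_index` and `simulate` are hypothetical.

```python
BHT_ENTRIES = 1024    # 1K entries, as stated in the question
INITIAL_COUNTER = 3   # every entry starts at 3 (strongly taken), as in the table


def bht_index(address):
    # Assumption: 4-byte instructions, index = word address modulo the table size.
    return (address >> 2) % BHT_ENTRIES


def simulate(iterations=100):
    """Run the loop and return (final touched BHT entries, mispredictions per branch, branch count)."""
    bht = {}             # only touched entries are stored; all others stay at INITIAL_COUNTER
    mispredictions = {}  # branch address -> number of mispredictions
    total = 0
    r3 = 1
    r10 = iterations
    for r2 in range(iterations):      # R2 takes the values 0, 1, ..., 99
        r1 = r2 & r3                  # AND R1, R2, R3
        r4 = r1 ^ r3                  # XOR R4, R1, R3
        r5 = r1 & r3                  # AND R5, R1, R3
        r10 -= 1                      # DADDI R10, R10, #-1
        # All four branches execute in every iteration, because each forward
        # target is reached whether or not the branch is taken.
        outcomes = [
            (0x0024, r1 == 0),        # BEQZ R1, L1
            (0x0030, r4 == 0),        # BEQZ R4, L2
            (0x003C, r5 == 0),        # BEQZ R5, L3
            (0x0054, r10 != 0),       # BNEZ R10, L0
        ]
        for address, taken in outcomes:
            idx = bht_index(address)
            counter = bht.get(idx, INITIAL_COUNTER)
            predicted_taken = counter >= 2
            if predicted_taken != taken:
                mispredictions[address] = mispredictions.get(address, 0) + 1
            # Saturating update: increment on taken, decrement on not taken.
            bht[idx] = min(counter + 1, 3) if taken else max(counter - 1, 0)
            total += 1
    return bht, mispredictions, total
```

Under these assumptions, `simulate()` reports 50 mispredictions for each of the three `BEQZ` branches and 1 for the `BNEZ`, that is 151 out of 400 executed branches (37.75%). The final counters are 2, 3, 2, and 2 for the branches at 0x0024, 0x0030, 0x003C, and 0x0054 respectively. A different indexing scheme or side effects in the grayed instructions would change these figures.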